Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

Authors

  • Takuya Takagi
  • Shunsuke Inenaga
  • Kunihiko Sadakane
  • Hiroki Arimura
Abstract

In this paper, we present a new data structure called the packed compact trie (packed c-trie), which stores a set S of k strings of total length n in n log σ + O(k log n) bits of space and supports fast pattern matching queries and updates, where σ is the size of the alphabet. Assume that α = log_σ n letters are packed in a single machine word on the standard word RAM model, and let f(k, n) denote the query and update times of the dynamic predecessor/successor data structure of our choice, which stores k integers from universe [1, n] in O(k log n) bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in O((m/α) f(k, n)) worst-case time and in O(m/α + f(k, n)) expected time. Our experiments show that our packed c-tries are faster than the standard compact tries (a.k.a. Patricia trees) on real data sets. As an application of our packed c-trie, we show that the sparse suffix tree for a string of length n over prefix codes with k sampled positions, such as evenly-spaced and word-delimited sparse suffix trees, can be constructed online in O((n/α + k) f(k, n)) worst-case time and O(n/α + k f(k, n)) expected time with n log σ + O(k log n) bits of space. When k = O(n/α), by using the state-of-the-art dynamic predecessor/successor data structures, we obtain sub-linear time construction algorithms using only O(n/α) bits of space in both cases. We also discuss an application of our packed c-tries to online LZD factorization.
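
The word-packing idea behind the bounds above can be illustrated with a small sketch. This is not the authors' implementation: the function names `pack` and `lcp_packed`, the fixed 8 bits per symbol, and the 64-bit word size are all assumptions chosen for illustration. It only shows why comparing packed words lets a longest-common-prefix computation advance several letters per step instead of one.

```python
def pack(s: bytes, bits_per_symbol: int = 8, word_bits: int = 64) -> list[int]:
    """Pack a byte string into integers holding word_bits // bits_per_symbol symbols each.

    Hypothetical helper: symbols must fit in bits_per_symbol bits.
    """
    per_word = word_bits // bits_per_symbol
    words = []
    for i in range(0, len(s), per_word):
        chunk = s[i:i + per_word]
        w = 0
        for c in chunk:
            w = (w << bits_per_symbol) | c
        # Left-align a short final chunk so symbol positions line up across words.
        w <<= bits_per_symbol * (per_word - len(chunk))
        words.append(w)
    return words


def lcp_packed(wa: list[int], len_a: int, wb: list[int], len_b: int,
               bits_per_symbol: int = 8, word_bits: int = 64) -> int:
    """Longest common prefix length of two packed strings, one word per comparison."""
    per_word = word_bits // bits_per_symbol
    limit = min(len_a, len_b)
    for i, (x, y) in enumerate(zip(wa, wb)):
        diff = x ^ y
        if diff:
            # The highest set bit of the XOR locates the first differing symbol.
            equal_bits = word_bits - diff.bit_length()
            return min(limit, i * per_word + equal_bits // bits_per_symbol)
    return limit


a, b = b"abcdefghij", b"abcdefghiz"
print(lcp_packed(pack(a), len(a), pack(b), len(b)))
```

With these assumed parameters each loop iteration compares eight letters at once, which is the source of the m/α factor in the abstract; a compact trie would apply such a comparison along edge labels during a search.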

Similar resources

String Processing Algorithms

The thesis describes extensive studies on various algorithms for efficient string processing. Data available in or via computers are often of enormous size, and thus it is important and necessary to invent time- and space-efficient methods to process them. Most such data are, in fact, stored and manipulated as strings. String matching is most fundamental in string processing, where...

Applications of Succinct Dynamic Compact Tries to Some String Problems

The dynamic compact trie is a fundamental data structure for a wide range of string processing problems. In this paper, we report our recent work on succinct dynamic compact tries that store a set of strings of total length n in O(n log σ) space, supporting pattern matching and insert/delete operations in O((|P|/α) f(n)) time, where P is a pattern string, α = Θ(log_σ n), and f(n) = O((log log n) ...

Deterministic Indexing for Packed Strings

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In the deterministic variant, the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the packed variant, the strings are stored with several characters in a single word, g...

A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries

Data streams are unbounded, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries on them are mostly one-pass. Executing such algorithms raises challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...

Faster Dynamic Compact Tries with Applications to Sparse Suffix Tree Construction and Other String Problems

The dynamic compact trie is a fundamental data structure for a wide range of string processing problems. Jansson, Sadakane, and Sung (LNCS 4855, pp. 424-435, FSTTCS 2007) presented the dynamic uncompacted trie data structure of n nodes in O(n log σ) space supporting pattern matching in O((|P|/α) f(n)) time and insert/delete operations in O(f(n)) time, where f(n) = ((log log n)/log log log n) is th...

Journal: IEICE Transactions

Volume: 100-A

Publication date: 2016